# Multimodal Visual Representation

Webssl Mae1b Full2b 224

A 1-billion-parameter Vision Transformer trained as a masked autoencoder, a self-supervised method, on 2 billion web images. It learns visual representations without any language supervision.
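To make the masked-autoencoder idea concrete, here is a minimal NumPy sketch of the patch-masking step only: an image is cut into patches, a random subset is hidden, and the hidden patches become the reconstruction target. The patch size of 16, the 75% mask ratio, and the helper names `patchify` and `random_mask` are illustrative assumptions, not details taken from this model's configuration.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into rows of flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, C)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def random_mask(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible_idx, masked_idx)."""
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    masked_idx = np.sort(perm[:num_masked])
    visible_idx = np.sort(perm[num_masked:])
    return visible_idx, masked_idx

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))          # stand-in for a 224x224 RGB image
patches = patchify(image, patch_size=16)   # assumed patch size (hypothetical)
visible_idx, masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)

encoder_input = patches[visible_idx]          # the encoder sees only these
reconstruction_target = patches[masked_idx]   # the decoder must predict these
```

In a full training loop, an encoder would process `encoder_input`, a lightweight decoder would predict the masked patches, and the loss would compare that prediction with `reconstruction_target`. No captions or labels are involved, which is why the approach needs no language supervision.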

Image Classification

RADIO is a vision foundation model developed by NVIDIA Research that unifies visual information across different domains to serve a variety of vision tasks.

Image Segmentation

